Extracting Tree Adjoining Grammars from Bracketed Corpora
Authors
Fei Xia, Department of Computer and Information Science, University of Pennsylvania, 3401 Walnut Street, Suite 400A, Philadelphia PA 19104, USA
Abstract
In this paper, we report our work on extracting lexicalized tree adjoining grammars (LTAGs) from partially bracketed corpora. The algorithm first fully brackets the corpora, then extracts elementary trees (etrees), and finally filters out invalid etrees using linguistic knowledge. We show that the set of extracted etrees may not be complete enough to cover the whole language, but this will not have a big impact on parsing.
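The abstract does not give the details of the extraction step, so the following is only a toy sketch of the general idea, not the paper's algorithm. It assumes the input tree is already fully bracketed, uses a hypothetical head table (`HEADS`) to pick the head child of each phrase, and treats every non-head child as an argument that becomes a substitution site. Adjuncts, auxiliary trees, and the filtering stage are not modelled. The names `Node`, `parse`, `extract_etrees`, and the `[subst]` marker (standing in for the usual down-arrow) are all illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical head table: phrase label -> label of its head child.
HEADS = {"S": "VP", "VP": "V", "NP": "N"}


@dataclass
class Node:
    label: str
    children: List["Node"] = field(default_factory=list)
    word: Optional[str] = None  # set only on preterminals such as (N John)


def parse(text: str) -> Node:
    """Read one bracketed tree such as "(S (NP (N John)) (VP (V sleeps)))"."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read() -> Node:
        nonlocal pos
        assert tokens[pos] == "(", "expected an opening bracket"
        node = Node(tokens[pos + 1])
        pos += 2
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                node.children.append(read())
            else:
                node.word = tokens[pos]
                pos += 1
        pos += 1  # consume the closing bracket
        return node

    return read()


def extract_etrees(root: Node) -> List[str]:
    """Split a tree into toy elementary trees, one per lexical anchor.

    The head child stays inside the current etree; every other child is
    replaced by a substitution site and starts an etree of its own.
    """
    etrees: List[str] = []

    def build(node: Node) -> str:
        if node.word is not None:
            return f"({node.label} {node.word})"
        head_label = HEADS.get(node.label)
        head_seen = False
        parts = []
        for child in node.children:
            if child.label == head_label and not head_seen:
                head_seen = True
                parts.append(build(child))
            else:
                parts.append(child.label + "[subst]")
                etrees.append(build(child))
        return f"({node.label} {' '.join(parts)})"

    etrees.insert(0, build(root))
    return etrees


if __name__ == "__main__":
    for etree in extract_etrees(parse("(S (NP (N John)) (VP (V sleeps)))")):
        print(etree)
```

In this sketch the sentence yields one etree anchored by the verb, with the subject as a substitution site, and one etree anchored by the noun. A real extractor would also need to distinguish arguments from adjuncts, which this toy version does not attempt.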
Similar resources
Automatically Extracting and Comparing Lexicalized Grammars for Different Languages
In this paper, we present a quantitative comparison between the syntactic structures of three languages: English, Chinese and Korean. This is made possible by first extracting Lexicalized Tree Adjoining Grammars from annotated corpora for each language and then performing the comparison on the extracted grammars. We found that the majority of the core grammar structures for these three language...
Automated Extraction of TAGs from the Penn Treebank
The accuracy of statistical parsing models can be improved with the use of lexical information. Statistical parsing using lexicalized tree adjoining grammar (LTAG), a kind of lexicalized grammar, has remained relatively unexplored. We believe that this is largely due to the absence of large corpora accurately bracketed in terms of a perspicuous yet broad coverage LTAG. Our work attempts to a...
Contextual Tree Adjoining Grammars
In this paper, we introduce a formalism called contextual tree adjoining grammar (CTAG). CTAGs are a generalization of multi bracketed contextual rewriting grammars (MBICR) which combine tree adjoining grammars (TAGs) and contextual grammars. The generalization is to add a mechanism similar to obligatory adjoining in TAGs. Here, we present the definition of the model and some result...
Extraction of Tree Adjoining Grammars from a Treebank for Korean
We present the implementation of a system which extracts not only lexicalized grammars but also feature-based lexicalized grammars from the Korean Sejong Treebank. We report on some practical experiments where we extract TAG grammars and tree schemata. Above all, the full-scale syntactic tags and well-formed morphological analysis in the Sejong Treebank allow us to extract syntactic features. In addition, ...
From Treebanks to Tree-Adjoining Grammars
Grammars are valuable resources for natural language processing. A large-scale grammar may incorporate a vast amount of information on morphology, syntax, and semantics. Traditionally, grammars are built manually. Hand-crafted grammars often contain rich information, but require tremendous human effort to build and maintain. As large-scale treebanks have become available in the last decade, there ha...